human_text: escape inner double quotes and backslashes in change and unchanged values#48

Open
HrachShah wants to merge 2 commits into
simonw:mainfrom
HrachShah:fix/human-text-escape-quotes-in-values


@HrachShah


csv-diff's human_text() rendered change lines and the unchanged summary by wrapping prev/current values in literal "...", but did not escape characters inside the value. A row whose name changed from hello "world" to goodbye "cruel" world rendered as name: "hello "world"" => "goodbye "cruel" world". The inner double quotes were indistinguishable from the wrapping quotes, so a downstream reader could not tell where each value started or ended. Backslash characters were likewise emitted unescaped, which left them just as ambiguous.
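To make the ambiguity concrete, here is a minimal sketch of the pre-fix style of rendering. The function name naive_change_line is hypothetical and is not taken from csv-diff; it only mimics the wrap-without-escaping behaviour described above.

```python
def naive_change_line(field, prev, current):
    # Hypothetical stand-in for the old rendering: wrap each value in
    # double quotes without escaping anything inside it.
    return '{}: "{}" => "{}"'.format(field, prev, current)


# Two different (prev, current) pairs...
line_a = naive_change_line("name", 'a" => "b', "c")
line_b = naive_change_line("name", "a", 'b" => "c')

# ...produce exactly the same text, so a reader cannot recover the values.
print(line_a)            # name: "a" => "b" => "c"
print(line_a == line_b)  # True
```

Because both calls yield the same line, no parser can decide which pair of values produced it.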

The new _format_quoted() helper centralises the rendering: stringify the value, escape backslashes first, then double quotes, then wrap in one pair of double quotes. The change lines and the unchanged summary both go through it now, so:

  • hello "world" renders as "hello \"world\"" (unambiguous)
  • back\slash renders as "back\\slash" (unambiguous)
  • Cleo renders as "Cleo" (matches the existing convention for change values)

human_row() (used for added/removed rows) keeps its plain key: value format, since those rows are written line-by-line and don't have the value-pair ambiguity that the change line does.

python3 -m pytest tests/: 26 passed (24 baseline + 2 new regression tests).

Zo Bot added 2 commits June 15, 2026 19:36
…r runs csv-diff against an empty file, csv.reader returns no rows and the previous code let StopIteration bubble out of next(fp), producing a confusing traceback at the top of the call stack with no indication that the input was empty; the new try/except translates StopIteration into a typed ValueError with a descriptive message so the CLI shows 'CSV input is empty (no header row found)' and downstream loaders / Click error handling can react to it explicitly
…es internal double quotes and backslashes

human_text() wrapped prev/current values in literal '"..."' on the change line
and the unchanged row summary, but only used plain str(value) in human_row()
for added/removed rows. A value containing a double quote, e.g. 'hello "world"',
rendered as 'name: "hello "world"" => "goodbye "cruel" world"' - the inner
quotes were indistinguishable from the wrapping quotes, so a downstream reader
could not tell where each value started or ended. The same ambiguity applied
to backslash characters (a backslash was emitted unescaped inside the quoted value).

The new _format_quoted() helper centralises the rendering: stringify, escape
backslashes first, then double quotes, and wrap in a single pair of double
quotes. The change lines and the unchanged summary now both go through it, so
'Cleo' renders as '"Cleo"' (matching the existing convention for change
values) and 'hello "world"' renders as '"hello \"world\""' instead of
the previous ambiguous form. Non-string values (ints, None) stringified via
str() keep the existing behaviour for change lines.